Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gtneg mul divoptimizations #45604

Merged
merged 19 commits into from
Jan 12, 2021

Conversation

alexcovington
Copy link
Contributor

Extends the JIT to optimize the cases outlined in #44375.

I used the following test case:

using System;
using System.Runtime.CompilerServices;

namespace Scratch
{
    public class Program
    {
        [MethodImpl(MethodImplOptions.NoInlining)]
        static int M5(int a) => -a / 7;  // optimize to  "a / -7"

        [MethodImpl(MethodImplOptions.NoInlining)]
        static int M6(int a) => -(a / 7); // optimize to  "a / -7"

        [MethodImpl(MethodImplOptions.NoInlining)]
        static int M7(int a) => -a * 7;  // optimize to  "a * -7"

        [MethodImpl(MethodImplOptions.NoInlining)]
        static int M8(int a) => -(a * 7); // optimize to  "a * -7"

        public static void Main(string[] args) {
            Console.WriteLine("{0} {1} {2} {3}", M5(10), M6(10), M7(10), M8(10));
        }
    }
}

jit-diff output:

PS C:\Users\acovingt\source\repos\Scratch> jit-diff diff --base C:\Users\acovingt\source\repos\runtime-master\artifacts\bin\coreclr\windows.x64.Debug\ --diff C:\Users\acovingt\source\repos\runtime\artifacts\bin\coreclr\windows.x64.Debug\ --core_root C:\Users\acovingt\source\repos\runtime-master\artifacts\tests\coreclr\windows.x64.Debug\Tests\Core_Root\ --assembly .\bin\Debug\netcoreapp5.0\Scratch.dll --output C:\Users\acovingt\Documents\diffs\
Beginning Crossgen CodeSize Diffs for .\bin\Debug\netcoreapp5.0\Scratch.dll
- Finished 1/1 Base 1/1 Diff [2.9 sec]
Completed Crossgen CodeSize Diffs for .\bin\Debug\netcoreapp5.0\Scratch.dll in 2.94s
Diffs (if any) can be viewed by comparing: C:\Users\acovingt\Documents\diffs\dasmset_16\base C:\Users\acovingt\Documents\diffs\dasmset_16\diff

git diff --no-index --diff-filter=M --exit-code --numstat C:\Users\acovingt\Documents\diffs\dasmset_16\diff C:\Users\acovingt\Documents\diffs\dasmset_16\base

Analyzing CodeSize diffs...
Found 1 files with textual diffs.

Crossgen CodeSize Diffs for .\bin\Debug\netcoreapp5.0\Scratch.dll for  default jit

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 431
Total bytes of diff: 421
Total bytes of delta: -10 (-2.32% of base)
    diff is an improvement.

Top file improvements (bytes):
         -10 : Scratch.dasm (-2.32% of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
          -4 (-23.53% of base) : Scratch.dasm - Scratch.Program:M7(int):int
          -2 (-9.09% of base) : Scratch.dasm - Scratch.Program:M5(int):int
          -2 (-9.09% of base) : Scratch.dasm - Scratch.Program:M6(int):int
          -2 (-13.33% of base) : Scratch.dasm - Scratch.Program:M8(int):int

Top method improvements (percentages):
          -4 (-23.53% of base) : Scratch.dasm - Scratch.Program:M7(int):int
          -2 (-13.33% of base) : Scratch.dasm - Scratch.Program:M8(int):int
          -2 (-9.09% of base) : Scratch.dasm - Scratch.Program:M5(int):int
          -2 (-9.09% of base) : Scratch.dasm - Scratch.Program:M6(int):int

4 total methods with Code Size differences (4 improved, 0 regressed), 2 unchanged.
Completed analysis in 0.65s

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 4, 2020
@EgorBo
Copy link
Member

EgorBo commented Dec 5, 2020

I think you need to perform this optimization only if tree->gtOverflow is false, e.g.:

checked
{
    int x = -y * int.MinValue;
}

Also, I mentioned #44266 in the issue - you need to make sure the constant on the right is not ->IsIconHandle()

@EgorBo
Copy link
Member

EgorBo commented Dec 5, 2020

However, even unchecked context won't save you from this case:

class Program
{
    static async Task Main()
    {
        int x1 = -GetValue() / 1;  // OK
        int x2 =  GetValue() / -1; // throws OverflowException even in the unchecked context
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static int GetValue() => int.MinValue;
}

So I'd give up if op2 is 1, %type%.MinValue (and, perhaps -1 and 0) + tree->gtOverflow is true

case GT_MUL:
// -a * C => a * -C, where C is constant
// MUL(NEG(a), C) => MUL(a, NEG(C))
if (op1->OperIs(GT_NEG) && !op1->IsCnsIntOrI() && op2->IsCnsIntOrI())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!op1->IsCnsIntOrI() - I think you meant op1->gtOp1()->IsCnsIntOrI() here.

// op1: a
// op2: NEG
// op2Child: C
op1 = op1->AsOp()->gtOp1; // a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rewrite to be:

    tree->AsOp()->gtOp1 = op1->gtGetOp1();
    DEBUG_DESTROY_NODE(op1);
    op2->AsIntCon()->SetIconValue(-op2->AsIntCon()->IconValue());
    return tree;

@ViktorHofer
Copy link
Member

// Auto-generated message

69e114c which was merged 12/7 removed the intermediate src/coreclr/src/ folder. This PR needs to be updated as it touches files in that directory which causes conflicts.

To update your commits you can use this bash script: https://gist.github.com/ViktorHofer/6d24f62abdcddb518b4966ead5ef3783. Feel free to use the comment section of the gist to improve the script for others.

@alexcovington
Copy link
Contributor Author

@EgorBo Thanks for the feedback. I have these changes made on my dev machine, but the machine is having some issues at the moment. I expect to have these updated by tomorrow. Sorry for the delay!

@sandreenko sandreenko self-requested a review December 28, 2020 21:33
@sandreenko
Copy link
Contributor

Hi @alexcovington, thanks for your PR and sorry for the long review.
Could you please resolve the merge conflicts and run jit-format on your PR when you have time and ping @dotnet/jit-contrib after that.

@alexcovington
Copy link
Contributor Author

Pinging @dotnet/jit-contrib

@sandreenko sandreenko removed their request for review January 5, 2021 00:19
@kunalspathak
Copy link
Member

@alexcovington - Looks like there are multiple failures when crossgening System.Private.Corelib.dll. Did you get chance to take a look?

Generating native image of System.Private.CoreLib for windows.x64.Checked. Logging to F:\workspace\_work\1\s\artifacts\log\CrossgenCoreLib_windows__x64__Checked.log
  F:\workspace\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\crossgen.exe /nologo  /Platform_Assemblies_Paths "F:\workspace\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL" /out "F:\workspace\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\System.Private.CoreLib.dll" "F:\workspace\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\System.Private.CoreLib.dll"
  Extra flags on tree [000003]: -C---------
                 [000003] --C---------              *  CAST      int <- long
                 [000014] ---------U--              \--*  CAST      long <- ulong <- byref
                 [000000] ------------                 \--*  LCL_VAR   byref  V00 arg0         
  
  Assert failure(PID 4516 [0x000011a4], Thread: 7996 [0x1f3c]): Assertion failed '!"Extra flags on tree"' in 'System.SpanHelpers:UnalignedCountVector128(byref):long' during 'Morph - Global' (IL size 21)
  
      File: F:\workspace\_work\1\s\src\coreclr\jit\flowgraph.cpp Line: 22230
      Image: F:\workspace\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\crossgen.exe
  

@alexcovington
Copy link
Contributor Author

@kunalspathak Sorry, I started looking at this, but it fell of my radar for a little bit.

I'm a little stuck on the Extra flags on tree error, not exactly sure where the flags are added/how to remove them. Any advice would be greatly appreciated!

tree->AsOp()->gtOp1 = op1->gtGetOp1();
DEBUG_DESTROY_NODE(op1);
op2->AsIntCon()->SetIconValue(-op2->AsIntCon()->IconValue()); // -C
return tree;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the cause of the problem. You return the tree immediately but if you see after the switch-case, we update the tree flags. In your case, when we hit the assert, because GTF_CALL is not yet removed from the node because here, we return immediately. I believe same applies to the GT_MUL case.

I am not too familiar with the morph and probably @sandreenko should confirm.

@kunalspathak
Copy link
Member

Changes LGTM. Is it possible to get the libraries pmi code size diff?

* A GenTree* which points to the new tree. If the tree is not for simd intrinsic,
* return nullptr.
*/
* If a read operation tries to access simd struct field, then transform the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did jit-format cause these changes or were these unintentional edits? Could you please try to delete them and run jit-format again?

@@ -11909,6 +11909,22 @@ GenTree* Compiler::fgMorphSmpOp(GenTree* tree, MorphAddrContext* mac)
return fgMorphCast(tree);

case GT_MUL:
// -a * C => a * -C, where C is constant
// MUL(NEG(a), C) => MUL(a, NEG(C))
if (fgGlobalMorph && op1->OperIs(GT_NEG) && !op1->gtGetOp1()->IsCnsIntOrI() && op2->IsCnsIntOrI() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opts.OptimizationEnabled() check is missed here and in the other cases, when this flag is set to false we don't want to do any CQ transformations, only what is needed for correctness, it is used, for example, in minopts.

// -a * C => a * -C, where C is constant
// MUL(NEG(a), C) => MUL(a, NEG(C))
if (fgGlobalMorph && op1->OperIs(GT_NEG) && !op1->gtGetOp1()->IsCnsIntOrI() && op2->IsCnsIntOrI() &&
!op2->IsIconHandle() && op2->AsIntCon()->IconValue() != 1 && op2->AsIntCon()->IconValue() != 0 &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you check fgGlobalMorph here? Did you want to use !gtIsActiveCSE_Candidate ? gtIsActiveCSE_Candidate means it is not CSE phase (it could be a global morph phase instead) or this tree is not the current candidate that could be dangerous to remove.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked fgGlobalMorph because it was checked in #43921, which is what I based most of this work on. I can use !gtIsActiveCSE_Candidate if that would be better.

Still learning about the CoreCLR JIT, I greatly appreciate the advice :).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought I think fgGlobalMorph check is fine here because you are changing node value without updating its VN.
An alternative could be to do the optimization with !gtIsActiveCSE_Candidate but create a new tree for the constant:

tree->AsOp()->gtOp2 = gtNewIconNode(-op2->AsIntCon()->IconValue(), op2->TypeGet());
DEBUG_DESTROY_NODE(op2);

it would allow the optimization to happen in other phases, so I think it would be better.

// -a * C => a * -C, where C is constant
// MUL(NEG(a), C) => MUL(a, NEG(C))
if (fgGlobalMorph && op1->OperIs(GT_NEG) && !op1->gtGetOp1()->IsCnsIntOrI() && op2->IsCnsIntOrI() &&
!op2->IsIconHandle() && op2->AsIntCon()->IconValue() != 1 && op2->AsIntCon()->IconValue() != 0 &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought I think fgGlobalMorph check is fine here because you are changing node value without updating its VN.
An alternative could be to do the optimization with !gtIsActiveCSE_Candidate but create a new tree for the constant:

tree->AsOp()->gtOp2 = gtNewIconNode(-op2->AsIntCon()->IconValue(), op2->TypeGet());
DEBUG_DESTROY_NODE(op2);

it would allow the optimization to happen in other phases, so I think it would be better.

!op2->IsIconHandle() && op2->AsIntCon()->IconValue() != 1 && op2->AsIntCon()->IconValue() != 0 &&
op2->AsIntCon()->IconValue() != -1 && !tree->gtOverflow())
{
// tree: MUL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the comment for the tree before the transformation?
then it should be

// tree: MUL
// op1: Neg
// op1Child: a
// op2: Const

{
// tree: DIV
// op1: a
// op2: NEG
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same here.

alexcovington and others added 2 commits January 7, 2021 15:12
Co-authored-by: Sergey Andreenko <seandree@microsoft.com>
@sandreenko
Copy link
Contributor

Looks good, @alexcovington, do you mind if I push a small change to your branch for this PR? The checks for bad cases that @EgorBo pointed out should be a little bit different. Also, I think it is worth a new test (probably we should place it to https://github.com/dotnet/runtime/tree/master/src/tests/JIT/opt/InstructionCombining) that I expect will behave differently on arm/arm64 so I don't want to bother you with it (for your first PR 😄).

@alexcovington
Copy link
Contributor Author

@sandreenko Absolutely! Just let me know if you need anything else from me 😄.

@sandreenko
Copy link
Contributor

1 improvement in SPC:

Top method improvements (bytes):
          -2 (-5.71% of base) : System.Private.CoreLib.dasm - SpanHelpers:UnalignedCountVector128(byref):long (2 methods)

expected improvements on the repro test:

Top method regressions (bytes):
           1 ( 5.56% of base) : NegMulOrDivToConst.dasm - Program:Div2_3(int):long

Top method improvements (bytes):
          -6 (-23.08% of base) : NegMulOrDivToConst.dasm - Program:Div1_4(long):long
          -4 (-50.00% of base) : NegMulOrDivToConst.dasm - Program:Mul1_1(int):int
          -4 (-40.00% of base) : NegMulOrDivToConst.dasm - Program:Mul1_3(int):int
          -4 (-17.39% of base) : NegMulOrDivToConst.dasm - Program:Div1_1(int):int
          -3 (-8.82% of base) : NegMulOrDivToConst.dasm - Program:Div2_2(long):long
          -3 (-13.04% of base) : NegMulOrDivToConst.dasm - Program:Div2_4(long):long
          -2 (-22.22% of base) : NegMulOrDivToConst.dasm - Program:Mul1_5(int):int
          -2 (-33.33% of base) : NegMulOrDivToConst.dasm - Program:Mul2_1(int):int
          -2 (-9.52% of base) : NegMulOrDivToConst.dasm - Program:Div2_1(int):int

the regression is from -(a / int.MinValue); that could be optimized by lower better,

// If the divisor is the minimum representable integer value then we can use a compare,
// the result is 1 iff the dividend equals divisor.
divMod->SetOper(GT_EQ);

but it is not worth checking for it in my opinion.

Also, I decided to use !optValnumCSE_phase instead of !gtIsActiveCSE_Candidate(tree) because too many trees were changing their values during these transformations and checking them all was too messy.

@alexcovington
Copy link
Contributor Author

@sandreenko Thanks for the help, this is very informative as well! I'll keep it in mind for future PRs.

static int CheckMulNeg()
{
bool fail = false;
if (Mul1_1(3) != -7 * 3)
Copy link
Member

@kunalspathak kunalspathak Jan 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: can we make the method names Mul_1 and other more meaningful like Mul_Minus7, etc.?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sandreenko ! Should be good to go.

@sandreenko sandreenko merged commit f9cf601 into dotnet:master Jan 12, 2021
@sandreenko
Copy link
Contributor

Thanks @alexcovington for the PR!

@ghost ghost locked as resolved and limited conversation to collaborators Feb 11, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants